A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation

Author

  • Kunihiko Sadakane
Abstract

We propose a fast and memory-efficient algorithm for sorting the suffixes of a text in lexicographic order. Sorting suffixes is important because the array of suffix indexes in sorted order, called the suffix array, is a memory-efficient alternative to the suffix tree. Suffix sorting is also used for the Burrows-Wheeler transformation in Block Sorting text compression, so fast sorting algorithms are desired. We compare the suffix array construction algorithms of Bentley-Sedgewick, Andersson-Nilsson, and Karp-Miller-Rosenberg and the suffix tree construction algorithm of Larsson in terms of speed and required memory, and we propose a new algorithm that is fast and memory-efficient by combining them. We also define a measure of the difficulty of sorting suffixes: average match length. Our algorithm is effective when the average match length of a text is large, especially for large databases.
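To make the terms in the abstract concrete, here is a minimal Python sketch of a suffix array, the Burrows-Wheeler transformation read off from it, and one possible reading of "average match length". It uses naive comparison sorting, not the algorithm proposed in the paper. The function names are hypothetical, the input is assumed to end with a unique sentinel character that sorts before every other character, and treating average match length as the mean longest common prefix between neighbouring suffixes in sorted order is an assumption, since the abstract does not give the definition.

```python
def suffix_array(text):
    """Return suffix start positions in lexicographic order of the suffixes.

    Naive illustration only: sorting by full suffix slices is far slower
    than the specialised algorithms the abstract compares.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sa):
    """Read the Burrows-Wheeler transformation off a suffix array.

    Assumes text ends with a unique sentinel smaller than all other
    characters. For each sorted suffix, emit the character preceding it;
    for position 0, text[-1] wraps around to the sentinel.
    """
    return "".join(text[i - 1] for i in sa)


def mean_adjacent_lcp(text, sa):
    """Mean longest common prefix between neighbouring sorted suffixes.

    Assumption: used here as a stand-in for the abstract's
    "average match length", whose exact definition is not given there.
    """
    def lcp(i, j):
        n = 0
        while i + n < len(text) and j + n < len(text) and text[i + n] == text[j + n]:
            n += 1
        return n

    if len(sa) < 2:
        return 0.0
    total = sum(lcp(a, b) for a, b in zip(sa, sa[1:]))
    return total / (len(sa) - 1)


if __name__ == "__main__":
    text = "banana$"  # "$" acts as the sentinel
    sa = suffix_array(text)
    print(sa)
    print(bwt_from_suffix_array(text, sa))
    print(mean_adjacent_lcp(text, sa))
```

The longer the shared prefixes between neighbouring suffixes, the more characters a comparison-based sort has to inspect per comparison, which is why a measure of this kind indicates how hard a text is to suffix-sort.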


Similar resources

A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation

We propose a fast and memory efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an array of indexes of suffixes is called suffix array and it is a memory efficient alternative of the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in the Block Sorting text compression, therefore fast sorting algorithms are desire...


Constructing Suffix Arrays of Large Texts

Recently, Sadakane [12] proposes a new fast and memory efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an array of indexes of suffixes is called suffix array and it is a memory efficient alternative of the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in the Block Sorting text compression, therefore fast sor...


A Cooperative Distributed Text Database Management Method Unifying Search and Compression Based on the Burrows-Wheeler Transformation

A new text database management method for distributed cooperative environments is proposed, which can collect texts in distributed sites through a network of narrow bandwidth and enables full-text search in a unified efficient manner. This method is based on the two new developments in full-text search data structures and data compression. Specifically, the Burrows-Wheeler transformation is used as ...


Approximate Pattern Matching Over the Burrows-Wheeler Transformed Text

The compressed pattern matching problem is to locate the occurrence(s) of a pattern P in a text string T using a compressed representation of T, with minimal (or no) decompression. In this paper, we consider approximate pattern matching directly on Burrows-Wheeler transformed (BWT) text which is a critical step for a fully compressed pattern matching algorithm on a BWT based compression algorit...


Efficient haplotype matching and storage using the positional Burrows–Wheeler transform (PBWT)

MOTIVATION Over the last few years, methods based on suffix arrays using the Burrows-Wheeler Transform have been widely used for DNA sequence read matching and assembly. These provide very fast search algorithms, linear in the search pattern size, on a highly compressible representation of the dataset being searched. Meanwhile, algorithmic development for genotype data has concentrated on stati...



Publication date: 1998